-
We study (differentially) private federated learning (FL) of language models. The language models in cross-device FL are relatively small and can be trained with meaningful formal user-level differential privacy (DP) guarantees when massive parallelism in training is enabled by the participation of a moderate number of users. Recently, public data has been used to improve privacy-utility trade-offs for both large and small language models. In this work, we provide a systematic study of using large-scale public data and LLMs to aid differentially private training of on-device FL models, and we further improve the privacy-utility trade-off through distillation techniques. Moreover, we propose a novel, theoretically grounded distribution matching algorithm that samples public data close to the private data distribution, which significantly improves the sample efficiency of (pre-)training on public data. The proposed method is efficient and effective for training private models by taking advantage of public data, especially for customized on-device architectures that do not have ready-to-use pre-trained models.
-
Graph Neural Networks (GNNs) are neural models that leverage the dependency structure in graphical data via message passing among the graph nodes. GNNs have emerged as pivotal architectures in analyzing graph-structured data, and their expansive application in sensitive domains requires a comprehensive understanding of their decision-making processes, which necessitates a framework for GNN explainability. An explanation function for GNNs takes a pre-trained GNN along with a graph as input and produces a ‘sufficient statistic’ subgraph with respect to the graph label. A main challenge in studying GNN explainability is to provide fidelity measures that evaluate the performance of these explanation functions. This paper studies this foundational challenge, spotlighting the inherent limitations of prevailing fidelity metrics, including Fid+, Fid−, and Fid∆. Specifically, a formal, information-theoretic definition of explainability is introduced, and it is shown that existing metrics often fail to align with this definition across various statistical scenarios. This is due to potential distribution shifts when subgraphs are removed in computing these fidelity measures. Subsequently, a robust class of fidelity measures is introduced, and it is shown analytically that they are resilient to distribution shift issues and are applicable in a wide range of scenarios. Extensive empirical analysis on both synthetic and real datasets is provided to illustrate that the proposed metrics are more coherent with gold standard metrics. The source code is available at https://trustai4s-lab.github.io/fidelity.
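As an illustration of the prevailing metrics the abstract refers to, the sketch below computes accuracy-based Fid+, Fid−, and Fid∆ for a toy classifier. It assumes the commonly used definitions: Fid+ compares correctness on the full graph with correctness after the explanation subgraph is removed, Fid− compares it with correctness on the explanation subgraph alone, and Fid∆ is their difference. The function name `fidelity_scores`, the edge-set graph representation, and the toy model are hypothetical and are not taken from the paper or its source code.

```python
def fidelity_scores(predict, graphs, labels, explanations):
    """Accuracy-based Fid+, Fid-, and Fid-delta (assumed standard definitions).

    predict      -- callable mapping a frozenset of edges to a predicted label
    graphs       -- list of edge collections, one per graph
    labels       -- list of ground-truth graph labels
    explanations -- list of edge subsets proposed as explanations
    """
    n = len(graphs)
    fid_plus = 0.0
    fid_minus = 0.0
    for edges, label, expl in zip(graphs, labels, explanations):
        edges, expl = frozenset(edges), frozenset(expl)
        correct_full = predict(edges) == label
        # Fid+: remove the explanation and check whether the prediction breaks.
        correct_without = predict(edges - expl) == label
        # Fid-: keep only the explanation and check whether the prediction holds.
        correct_only = predict(expl) == label
        fid_plus += correct_full - correct_without
        fid_minus += correct_full - correct_only
    fid_plus /= n
    fid_minus /= n
    return fid_plus, fid_minus, fid_plus - fid_minus


def toy_model(edges):
    # Hypothetical classifier: label 1 exactly when the edge (a, b) is present.
    return 1 if ("a", "b") in edges else 0


graphs = [{("a", "b"), ("b", "c")}, {("b", "c"), ("c", "d")}]
labels = [1, 0]
explanations = [{("a", "b")}, {("c", "d")}]
scores = fidelity_scores(toy_model, graphs, labels, explanations)
```

Note that both the reduced graph `edges - expl` and the bare explanation `expl` may look unlike anything the model saw in training, which is the distribution-shift concern the abstract raises about these metrics.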
-
Federated Averaging (FedAvg) and its variants are the most popular optimization algorithms in federated learning (FL). Previous convergence analyses of FedAvg assume either full client participation or partial client participation in which clients can be sampled uniformly. However, in practical cross-device FL systems, only a subset of clients that satisfy local criteria such as battery status, network connectivity, and maximum participation frequency requirements (to ensure privacy) are available for training at a given time. As a result, client availability follows a natural cyclic pattern. We provide (to our knowledge) the first theoretical framework to analyze the convergence of FedAvg with cyclic client participation with several different client optimizers such as GD, SGD, and shuffled SGD. Our analysis shows that cyclic client participation can achieve a faster asymptotic convergence rate than vanilla FedAvg with uniform client participation under suitable conditions, providing valuable insights into the design of client sampling protocols.
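The cyclic participation pattern described above can be sketched with a toy simulation. This is a minimal illustration under stated assumptions, not the paper's algorithm or analysis: clients are split into a fixed number of groups by index, exactly one group is available per round in round-robin order, each client minimizes a scalar quadratic with plain GD, and the server averages the returned models. The names `local_gd` and `fedavg_cyclic` and all parameters are hypothetical.

```python
import random


def local_gd(w, target, lr, steps):
    """Run `steps` gradient-descent steps on f(w) = 0.5 * (w - target) ** 2."""
    for _ in range(steps):
        w -= lr * (w - target)  # gradient of the quadratic is (w - target)
    return w


def fedavg_cyclic(targets, num_groups, clients_per_round, rounds,
                  lr=0.1, local_steps=5, seed=0):
    """FedAvg where only one client group is available in each round.

    targets[i] is the minimizer of client i's local objective (assumed toy setup).
    """
    rng = random.Random(seed)
    # Assumed availability model: client i belongs to group i % num_groups.
    groups = [list(range(g, len(targets), num_groups)) for g in range(num_groups)]
    w = 0.0  # global model, a single scalar in this sketch
    for t in range(rounds):
        available = groups[t % num_groups]  # groups take turns cyclically
        k = min(clients_per_round, len(available))
        sampled = rng.sample(available, k)  # uniform sampling within the group
        updates = [local_gd(w, targets[i], lr, local_steps) for i in sampled]
        w = sum(updates) / len(updates)  # server averages the local models
    return w


w_same = fedavg_cyclic([1.0] * 8, num_groups=4, clients_per_round=2, rounds=60)
w_mixed = fedavg_cyclic([1.0, 2.0, 3.0, 1.5, 2.5, 3.0], num_groups=3,
                        clients_per_round=2, rounds=30)
```

With identical client objectives the global model converges to the shared minimizer; with heterogeneous objectives the iterate keeps moving as different groups take their turn, which is the regime a convergence analysis of cyclic participation has to handle.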
-
Alloying in two-dimensional (2D) transition metal dichalcogenides (TMDs) has allowed bandgap engineering and phase transformation, which provide more flexibility and functionality for electronic and photonic devices. To date, many ternary TMD alloys with homogeneous compositions have been synthesized. However, spatial modulation of the bandgap within a single TMD nanosheet remains largely unexplored. In this work, we demonstrate the synthesis of spatially composition-graded WSe2xTe2-2x flakes using an in situ chemical vapor deposition method. Photoluminescence and Raman line-scan characterization indicates a spatially graded bandgap, which increases from 1.46 eV (center) to 1.61 eV (edge) within one monolayer flake. Furthermore, electronic devices based on this spatially graded material exhibit tunable transfer characteristics.
-
The rapid growth of GPS technology and mobile devices has led to a massive accumulation of location data, bringing considerable benefits to individuals and society. One of the major uses of such data is travel time prediction, a typical service provided by GPS navigation devices and apps. Meanwhile, the constant collection and analysis of individual location data also pose unprecedented privacy threats. We leverage the notion of geo-indistinguishability, an extension of differential privacy to the location privacy setting, and propose a procedure for privacy-preserving travel time prediction without collecting actual individual GPS trace data. We propose new concepts to examine the impact of geo-indistinguishability sanitization on the usefulness of GPS traces and provide analytical and experimental utility analysis for privacy-preserving travel time prediction. We also propose new metrics to measure the adversary's error in learning individual GPS traces from the collected sanitized data. Our experimental results suggest that the proposed procedure provides travel time analysis with satisfactory accuracy at reasonably small privacy costs.
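For readers unfamiliar with geo-indistinguishability, the sketch below shows the planar Laplace mechanism, a standard way to achieve it: each point is displaced in a uniformly random direction by a radius drawn from a Gamma distribution with shape 2 and scale 1/epsilon. Whether the paper's sanitization uses exactly this mechanism is an assumption; the function name `planar_laplace` and the use of planar coordinates in meters (rather than latitude and longitude) are choices made here for illustration only.

```python
import math
import random


def planar_laplace(x, y, epsilon, rng=random):
    """Perturb a planar point (x, y) with planar Laplace noise.

    epsilon -- privacy parameter per unit of distance (here: per meter);
               smaller values give more noise and stronger privacy.
    rng     -- source of randomness exposing uniform() and gammavariate().
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    theta = rng.uniform(0.0, 2.0 * math.pi)   # uniformly random direction
    r = rng.gammavariate(2.0, 1.0 / epsilon)  # radius ~ Gamma(shape=2, scale=1/epsilon)
    return x + r * math.cos(theta), y + r * math.sin(theta)


# Sanitize a short hypothetical trace, one point at a time.
rng = random.Random(0)
trace = [(0.0, 0.0), (120.0, 35.0), (260.0, 80.0)]
sanitized = [planar_laplace(px, py, epsilon=0.01, rng=rng) for px, py in trace]
```

The expected displacement is 2/epsilon, so `epsilon=0.01` per meter moves each point by about 200 m on average. Perturbing every point of a trace independently spends privacy budget per point, so the total cost grows with trace length.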